Abstract

The effectiveness of direct measurement techniques and 
standardized achievement tests for assessing within-individual change 
over a 10-week period was examined. The Reading Comprehension and 
Language subtests from the Stanford Achievement Tests and direct 
measures of reading and written language were administered twice to 83 
low-achieving students in grades 3-6. Analyses indicated that greater 
student gains were evident on the direct measures than on the 
standardized achievement test. 



A Comparison of Standardized Achievement Tests and Direct
Measurement Techniques in Measuring Pupil Progress

How is the progress of exceptional children receiving special
education services best measured? Traditionally, standardized 
achievement tests have been preferred for the assessment of student 
growth (Mehrens & Lehmann, 1973; Stanley & Hopkins, 1972). Student 
improvement, or lack of it, on such tests is perceived as an index of 
how the pupil is progressing in his or her school situation. On the 
basis of this information, much of what happens to a student served in 
special education is determined, including: program placement, 
program planning, and exit from special services. However, it can be 
argued that norm-referenced, standardized achievement tests do not 
effectively measure student learning (Carver, 1974). 

The inadequacy of standardized achievement tests for measuring 
student change is related to three factors. First, norm-referenced 
achievement tests are designed primarily to measure individual 
differences, not changes in learning (Hively & Reynolds, 1975).
Scores from these tests may be interpreted only in relation to the
performance of others, and cannot be used for within-individual
comparisons. Furthermore, Carver (1974) noted that a single test
cannot fulfill both responsibilities. For instance, the most
efficient item for a norm-referenced test is one that has a passing 
proportion of .50 (p = .50), which maximizes the population variance. 
However, an assessment procedure that best measures learning would 
have items with p values near .00 before educational intervention and 
approaching 1.00 after treatment. Thus, psychometrically sound, norm-
referenced tests are not the optimal methodology for measuring
individual student learning. 
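
The contrast between the two item properties can be made concrete with a small calculation. The sketch below is illustrative only and is not part of the original study: it assumes a pass/fail item scored 0 or 1, whose variance is p(1 - p), and it uses a hypothetical helper name, `sensitivity_to_learning`, for the simple pre-to-post change in passing proportion.

```python
def item_variance(p):
    """Variance of a pass/fail (0/1) item with passing proportion p."""
    return p * (1.0 - p)


def sensitivity_to_learning(p_before, p_after):
    """Hypothetical index: change in passing proportion after instruction."""
    return p_after - p_before


# Item variance peaks at p = .50 and vanishes at p = .00 and p = 1.00.
for p in (0.0, 0.1, 0.25, 0.5, 0.75, 0.9, 1.0):
    print(f"p = {p:.2f}  variance = {item_variance(p):.4f}")

# An item that nearly everyone fails before teaching and nearly everyone
# passes afterward shows a large gain but little variance at either occasion.
print(sensitivity_to_learning(0.05, 0.95))
```

Under these assumptions, an item chosen to spread students apart (p near .50) and an item chosen to register learning (p moving from near .00 to near 1.00) have different statistical profiles, which is the tension Carver (1974) described.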



A second reason for dissatisfaction with standardized tests as
measures of progress relates to the sensitivity of these devices in
measuring what the student is taught. Jenkins and Pany (1978)
demonstrated that standardized tests of reading achievement
differentially sample the content of frequently used reading
curricula. Examination of their hypothetical data should alert
special educators to the distinct possibility that measurement of
student growth is a function of the test used and does not necessarily 
reflect true changes in pupil performance. Lovitt and Eaton (1972) 
cited actual case data that corroborate this conclusion. 

Third, the use of grade equivalent scores, a common practice with 
standardized achievement tests, is problematic in measuring student 
progress. It is well documented that grade equivalent scores are not 
expressions of equal interval units (Salvia & Ysseldyke, 1981; 
Thorndike & Hagen, 1969). Consequently, the aggregation or averaging 
of these scores in evaluating the progress of the special education
pupil must be viewed as highly suspect. 

Carver (1974) proposed that alternatives to norm-referenced 
psychometric methods must be developed if pupil progress is to be 
validly indexed. He labeled this new type of assessment as edumetric 
measurement, and emphasized the need for technically adequate 
procedures to evaluate within-individual gain. Recognizing the need 
for an edumetric approach, Jenkins, Deno, and Mirkin (1979) outlined
the desirable characteristics of such a measurement system: it must 
be relevant to the child's curriculum, sensitive to growth, flexible 
and adaptive to various instructional objectives, repeatedly 
administrable, and easily administrable. 
Several investigators in the field of special education have
developed measurement systems focusing on the assessment of within-
individual change (Deno & Mirkin, 1977; Lindsley, 1971; White &
Haring, 1980). Recent research with these methods, often referred to 
as direct measurement techniques, has shown that it is possible to 
validly measure student behaviors in the classroom and satisfy most of 
the desired characteristics outlined by Jenkins et al. (1979). In the 
area of reading, Deno, Mirkin, and Chiang (1982) demonstrated that a 
student's oral reading rate on a passage from his or her basal reader 
or on a list of words from the reader correlated highly with 
standardized achievement tests of decoding (r = .90) and reading 
comprehension (r = .80). In a similar study, focusing on written 
language skills, Deno, Marston, and Mirkin (1982) examined the written 
compositions of normal and learning disabled elementary students and
found that the number of words written, the number of words spelled
correctly, and the number of correct letter sequences written all 
correlated highly with standardized tests of written language 
achievement (range = .70 - .86). 

Such investigations are a necessary precondition in the
establishment of direct measurement techniques as technically adequate
(American Psychological Association, 1974). Another issue remains,
however: the issue of the measures' sensitivity to growth. The
capability of standardized achievement tests to monitor student change 
accurately has been seriously challenged (Hively & Reynolds, 1975). 
If direct measurement strategies are to be considered bona fide 
edumetric assessment procedures, their sensitivity to monitoring 
short-term progress must be substantiated. 

It is the purpose of this paper to examine the effectiveness of
direct measurement techniques to assess within-individual change in
contrast to standardized achievement tests. Specifically, the study
focuses upon the capacity of standardized achievement tests and direct
measurement techniques to measure pupil progress in children with
learning difficulties. A 10-week period was selected as the interval
in which student progress in reading and written language would be
assessed. Central to the analysis is the assumption that the 
measurement approach most sensitive to student growth would show the 
greatest pupil gains. 

Method 

Subjects 

Low-achieving elementary students from grades 3-6 participated in 
the study. These students attended three schools located in a rural, 
midwestern area. A measure of written expression, validated by Deno, 
Marston, and Mirkin (1982), was used to select the low-achieving 
students from a total population of 785 pupils. All students were 
asked to write two compositions, with the total number of words 
written on the second composition tallied. Those students who 
performed at or below the 15th percentile for their grade level were 
invited to participate in the study. Parental permission was received 
for 83 pupils. The sample sizes, means, 15th percentile cutoff 
scores, and distribution of sexes for each grade level are presented 
in Table 1. 
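
The selection rule described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' procedure: the paper does not say how the 15th percentile was computed, so the nearest-rank definition is assumed here, and the function names and the (student, grade, words written) record layout are hypothetical.

```python
from collections import defaultdict


def percentile_cutoff(scores, pct=15):
    """Nearest-rank percentile of a list of scores (assumed definition)."""
    ordered = sorted(scores)
    # Integer ceiling of pct * n / 100, kept at a minimum rank of 1.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]


def select_low_achievers(records, pct=15):
    """records: list of (student_id, grade, words_written) tuples."""
    by_grade = defaultdict(list)
    for _, grade, words in records:
        by_grade[grade].append(words)
    # One cutoff per grade, since screening was done within grade level.
    cutoffs = {g: percentile_cutoff(s, pct) for g, s in by_grade.items()}
    selected = [sid for sid, grade, words in records
                if words <= cutoffs[grade]]
    return cutoffs, selected


# Hypothetical screening data: 20 third graders writing 1..20 words.
demo = [(f"s{i}", 3, i) for i in range(1, 21)]
cutoffs, selected = select_low_achievers(demo)
print(cutoffs, selected)
```

Computing the cutoff separately for each grade mirrors the statement that students were compared with others at their own grade level; a different percentile definition would shift the cutoff slightly.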
Insert Table 1 about here



Procedures 

A set of standardized achievement tests and direct measurement
tasks was administered twice to the 83 students. The assessment
procedures were administered in the first week of October and 10 weeks
later, in December.

The standardized achievement tests administered to the students 
were the Reading Comprehension and the Language subtests from the 
Stanford Achievement Tests (Madden, Gardner, Rudman, Karlsen, &
Merwin, 1978). Because the grade levels of the students ranged from
third to sixth grade, four different forms of the SAT were
administered: Primary II-A, Primary III-B, Intermediate I-A, and 
Intermediate II-B. The scores obtained for each subtest were raw 
score, scaled score, grade equivalent, and percentile. 

The direct measure of reading used in the study was derived from
Deno, Mirkin, and Chiang (1982). A list of words was selected
randomly from the third grade level of the Harris-Jacobson Word List
(Harris & Jacobson, 1972) and used for the reading task. Each student
was asked to read aloud for one minute. Test instructions read
verbatim to the subject were:

Here is a word list that I want you to read. When I tell 
you to start, you can read across the page. Please read as 
fast and accurately as you can. If you get stuck on any of 
the words, move on to the next one. I will tell you when to 
stop reading. Are there any questions? Ready? Begin.

The child then was timed for 60 seconds while the examiner followed
along on a recording sheet identical to the student's list, recording
the mistakes. If a student did not respond after approximately six
seconds, he or she was told to move on to the next word. For each
student the number of words read correctly (WRC) was scored.
Estimates of inter-rater agreement ranged from .94 to .98.

In addition to reading the third grade lists, the fourth, fifth,
and sixth graders were asked to read a list of words produced from
their grade level from the Harris-Jacobson list. For example, the
fifth graders read both a third grade list and a fifth grade list.
For each student the number of words read correctly from grade level
(WRCG) was counted.

The measure of written language employed in this study was based
upon the research of Deno, Marston, and Mirkin (1982). Pupils were
administered the same story starter at weeks 1 and 10; each time, they
were given three minutes to complete this task. For each student, the
total number of words written (TWW), the number of words written
(spelled) correctly (WWC), and the number of correct letter sequences
(CLS) were computed. Inter-scorer reliability coefficients ranged
between .91 and .96.

Analysis

Two different analyses were conducted upon the data. Analysis I
focused upon the amount of change from week 1 to week 10 on the
standardized and direct measures. Using a paired t test analysis for
each measure, a t value was computed and interpreted as representative
of the amount of change that each test measured. While Analysis I
provided an interpretation of how much growth is evident for each
measure, it did not provide a direct comparison of standardized tests
and direct measurement techniques. Analysis II was designed to
produce this comparison by contrasting student change on direct
measures with change on the standardized tests. However, assessment
of improvement between weeks 1 and 10 on the measures was not made
with equivalent units. Derived scores from the SAT included raw
scores, scaled scores, grade equivalents, and percentiles. All scores
on the direct measures were in raw score units. To remedy this
situation, a modification of Glass' (1978) Effect Size analysis was
used. Thus, for each student an Effect Size (ES) was calculated by
subjecting the week 10 score to a type of z score transformation using
week 1 as a referent.

In the following formula, ES is a student's standardized week
10 score, $X_2$ is the observation at week 10, $\bar{X}_1$ is the mean of
all students at week 1, and $SD_1$ is the standard deviation of all
students at week 1:

$$ES = \frac{X_2 - \bar{X}_1}{SD_1}$$

Thus, the student's growth is determined by the ratio of his or her 
deviation from the week 1 mean and the standard deviation of week 1. 
If there is little or no change between weeks 1 and 10 on the measure, 
the student's ES approaches zero. However, if the student performed 
better at week 10, the ES should be greater than 0 (conversely, if 
week 10 scores were lower than week 1, ES would be less than 0). The 
transformation provides an index of student growth relative to initial 
performance that is directly comparable across measures. Once Effect 
Size scores were computed for each student on each assessment
procedure, it was possible to compare student change performance on
standardized tests and a direct measure. In Analysis II, the contrast
of standardized and direct measures was achieved with paired t-test
analysis with Effect Size as the dependent variable.
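
The Effect Size transformation and the Analysis II contrast can be sketched as follows. This is a minimal illustration, not the authors' code: the scores are invented, the function names are hypothetical, and the use of the sample (n - 1) standard deviation is an assumption, since the paper does not state which form was used.

```python
import math
import statistics


def effect_sizes(week1, week10):
    """Standardize each week 10 score against the week 1 mean and SD."""
    mean1 = statistics.mean(week1)
    sd1 = statistics.stdev(week1)  # assumption: sample SD (n - 1)
    return [(x2 - mean1) / sd1 for x2 in week10]


def paired_t(a, b):
    """Paired t statistic for two equal-length lists of scores."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))


# Invented scores for five students on one direct and one standardized measure.
direct_w1, direct_w10 = [20, 35, 50, 65, 80], [40, 44, 66, 75, 100]
sat_w1, sat_w10 = [30, 40, 50, 60, 70], [31, 43, 49, 64, 70]

es_direct = effect_sizes(direct_w1, direct_w10)
es_sat = effect_sizes(sat_w1, sat_w10)

# Analysis II: contrast the two sets of Effect Sizes student by student.
print(paired_t(es_direct, es_sat))
```

Because both sets of scores are expressed in week 1 standard deviation units, the paired contrast no longer depends on whether the original scores were raw scores, scaled scores, grade equivalents, or percentiles.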

Results 

Analysis I

Table 2 is a summary of the t values comparing the scores of
weeks 1 and 10 for the standardized measures. For the Reading
Comprehension subtest from the SAT, t values ranged from 2.17 to 2.55;
all were significant (p < .05). Change was not apparent on the
Language subtest from the SAT; t values comparing performance at weeks
1 and 10 ranged from .39 to 1.63, all statistically nonsignificant.
As is evident in Table 3, t values for the direct measures were much
larger; all were significant at the .001 level. The greatest change
in performance was reflected in words read correctly (WRC: t = 11.74,
p < .001). The smallest t value was found for words read correctly
from grade level (WRCG: t = 3.69, p < .001).

Insert Tables 2 and 3 about here 



Analysis II 

Paired t test analyses comparing the mean student Effect Sizes
for reading measures are presented in Table 4. For the area of
reading, the student ESs for words read correctly (WRC) were
significantly greater than all Reading Comprehension ESs (p < .001).
Effect Sizes for words read correctly from grade level (WRCG) and SAT
reading scores were not significantly different. In written language,
total number of words written (TWW), number of words written correctly
(WWC), and number of correct letter sequences (CLS) student Effect
Sizes were all significantly greater than the SAT Language Effect
Sizes (p < .001). These values are presented in Table 5.



Insert Tables 4 and 5 about here 



Discussion 

The measurement of pupil progress is a significant issue for 
those responsible for the delivery of special education services. PL 
94-142 mandates that an Individual Educational Plan (IEP) be written
for each handicapped child; the IEP includes the specification of
goals and objectives related to the pupil's instructional needs. As
Jenkins et al. (1979) pointed out, the implementation of such a system 
should "raise our sensitivities about the need to develop satisfactory 
procedures for measuring children's progress" (p. 82). Yet, debate 
continues over the appropriateness of standardized achievement 
measurement as the primary methodology for monitoring a student's 
progress over brief time intervals.

Direct measurement techniques are perceived by many as a viable
alternative. Although the study of these techniques has been
initiated only recently, it appears that the measures are valid with
respect to APA Standards (Deno, Mirkin, & Chiang, 1982; Deno, Marston,
& Mirkin, 1982). The analyses presented here provide preliminary
evidence that direct measurement techniques are more sensitive to 
short-term growth in pupils with learning difficulties than are
standardized achievement tests. In reading, the student Effect Sizes
for oral fluency were significantly greater than the gains students
made on the SAT Reading Comprehension subtest. Similarly, direct
measures of written expression were much more sensitive to pupil 
progress over 10 weeks than the SAT Language subtest, on which 
virtually no growth was evident. 

While the Effect Size analysis dramatically supports the notion 
that direct measurement is more sensitive to growth, the conclusion 
must be tempered by the absence of an external criterion for student 
improvement. It is plausible that student growth did not occur over 
the 10-week period, and that standardized achievement tests more 
accurately detected this phenomenon. However, this argument is based 
on the notion that students improved very little over a 2½-month
period, an event that seems improbable, even for low-achieving pupils. 
Regardless, future research in this area must attend to this 
methodological deficiency. 

Of greater concern may be the use of a pre-post test design to 
study pupil change in performance. The reliability of such change 
scores has been debated (Cronbach & Furby, 1970). In practice, this 
criticism would have little effect on direct measurement, for in
addition to being a measurement system more closely linked to the
student's curriculum, it also is a system based upon repeated
measurements and not the pre-post test design. As Nunnally (1967)
noted, repeated observations increase reliability. Standardized
achievement tests, on the other hand, can only be used in pre-post
test designs since they are not designed to be used on a repeated and
frequent basis.

In summary, the needs of special educators in measuring pupil 
progress for IEP goals and objectives may be fulfilled by the use of
direct measurement techniques. The preliminary research on pupil
progress measurement presented here supports the contention that 
direct measures are preferable to standardized achievement tests. 

Table 1

Descriptive Data from Subject Selection Procedure

         Number of                   15th Percentile   Sex Distribution
Grade    Students Screened   Mean    Cutoff Score      Male   Female

  3            190           17.7         9.0           14      12
  4            185           22.7        12.0           10       7
  5            225           30.5        19.0           14       5
  6            185           36.4        24.0           13   [illegible]

Table 2

Paired t-test Comparison of Student Performance at Weeks 1 and 10
on Reading Comprehension and Language Subtests of the SAT

                         Time of                  Standard
Measure                  Assessment   Mean        Deviation   t value

Reading Comprehension    Week 1        41.4[?]      21.7       2.3[?]
  Raw Score              Week 10       4[?].74      21.3

Reading Comprehension    Week 1       152.56        19.2       2.5[?]
  Scaled Score           Week 10      155.92        17.0

Reading Comprehension    Week 1         5.05         1.9       2.1[?]
  Grade Equivalents      Week 10        5.31         1.8

Reading Comprehension    Week 1        52.08        26.5       2.1[?]
  Percentile Ranks       Week 10       56.19        26.0

Language                 Week 1        35.87        12.9      -1.[?]
  Raw Score              Week 10       34.40        12.2

Language                 Week 1       149.47        25.2      [illegible]
  Scaled Score           Week 10      148.56        22.0

Language                 Week 1         4.77         2.1      [illegible]
  Grade Equivalents      Week 10        4.66         2.0

Language                 Week 1        50.26        27.0      -1.[?]
  Percentile Ranks       Week 10       45.19        26.8

Table 3

Paired t-test Comparison of Student Performance at Weeks 1 and 10
on Direct Measures of Reading and Written Expression

                            Time of                Standard
Measure                     Assessment   Mean      Deviation   t value   Probability

Words Read Correctly        Week 1        46.85      26.1       11.74       .001
  3rd Grade Level           Week 10       60.71      28.8

Words Read Correctly        Week 1        30.33      20.1        3.69       .001
  Grade Level               Week 10       37.00      25.3

Total Words Written         Week 1        26.23       9.1        6.28       .001
                            Week 10       34.65      12.2

Words Written Correctly     Week 1        23.65       9.0        5.86       .001
                            Week 10       31.01      11.8

Correct Letter Sequences    Week 1       110.96      41.6        6.37       .001
  for Writing Task          Week 10      146.40      54.7

Table 4

Comparison of Student Effect Sizes on SAT Subtests and Direct Measures in Reading*

                                                Mean Student   Standard
Measures Compared                               Effect Size    Deviation   t value   Probability

DM: Words Read Correctly (3rd Grade)                .68           1.1        3.73       .001
ST: Reading Comprehension (Raw Score)               .13           1.2

DM: Words Read Correctly (3rd Grade)                .70           1.1        4.16       .001
ST: Reading Comprehension (Scaled Score)            .16           1.0

DM: Words Read Correctly (3rd Grade)                .70           1.1        4.20       .001
ST: Reading Comprehension (Grade Equivalent)        .15           1.0

DM: Words Read Correctly (3rd Grade)                .70           1.1        3.98       .001
ST: Reading Comprehension (Percentile)              .18           1.0

DM: Words Read Correctly (Grade Level)              .32           1.2        1.02       .313
ST: Reading Comprehension (Raw Score)               .15           1.0

DM: Words Read Correctly (Grade Level)              .29           1.2         .57       .575
ST: Reading Comprehension (Scaled Score)            .21            .8

DM: Words Read Correctly (Grade Level)              .29           1.2         .82       .415
ST: Reading Comprehension (Grade Equivalent)        .17            .9

DM: Words Read Correctly (Grade Level)              .29           1.2         .80       .412
ST: Reading Comprehension (Percentile)              .18           1.0

*DM represents direct measurement
 ST represents standardized achievement test

Table 5

Comparison of Student Effect Sizes on SAT Subtest and Direct Measures in Written Language*

                                         Mean Student   Standard
Measures Compared                        Effect Size    Deviation     t value   Probability

DM: Total Words Written                     1.26           1.7          4.93       .001
ST: Language (Raw Score)                    -.19           1.1

DM: Total Words Written                     1.26           1.7          4.69       .001
ST: Language (Scaled Score)                 -.10        [illegible]

DM: Total Words Written                     1.26           1.7          4.62       .001
ST: Language (Grade Equivalent)             -.08           1.1

DM: Total Words Written                     1.26           1.7          5.63       .001
ST: Language (Percentile)                   -.38           1.2

DM: Words Written Correctly                  .94           1.6          4.13       .001
ST: Language (Raw Score)                    -.19           1.1

DM: Words Written Correctly                  .94           1.6          3.84       .001
ST: Language (Scaled Score)                 -.10           1.0

DM: Words Written Correctly                  .94           1.6          3.73       .001
ST: Language (Grade Equivalent)             -.08           1.1

DM: Words Written Correctly                  .94           1.6          4.80       .001
ST: Language (Percentile)                   -.33           1.2

DM: Correct Letter Sequences                1.15           1.8          4.40       .001
ST: Language (Raw Score)                    -.19           1.1

DM: Correct Letter Sequences                1.15           1.3          4.10       .001
ST: Language (Scaled Score)                 -.10           1.0

DM: Correct Letter Sequences                1.15           1.3          4.02       .001
ST: Language (Grade Equivalent)             -.08           1.1

DM: Correct Letter Sequences                1.15           1.8          4.[?]0     .001
ST: Language (Percentile)                   -.38           1.2

*DM represents direct measurement
 ST represents standardized achievement test